A Rough Set Approach to Classifying Web Page Without Negative Examples
Authors
Abstract
This paper studies the problem of building Web page classifiers from positive and unlabeled examples and proposes a more principled technique for solving it, based on tolerance rough sets and the Support Vector Machine (SVM). The method uses tolerance classes to approximate the concepts present in Web pages and to enrich their representation, and from this it draws an initial approximation of the negative examples. It then runs SVM iteratively to build a margin-maximizing classifier that progressively improves this approximation, so that the class boundary eventually converges to the true boundary of the positive class in the feature space. Experimental results show that the method significantly outperforms existing methods.
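The tolerance-class step can be illustrated with a minimal sketch. It assumes the standard tolerance rough set model from text retrieval, in which two terms are tolerant of each other when they co-occur in at least `theta` documents; the abstract does not confirm that the paper uses exactly this construction. The function names, the threshold `theta`, and the rule for picking initial negatives are hypothetical, and the iterative SVM stage is not shown.

```python
from collections import defaultdict
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to the set of terms that co-occur with it in >= theta documents.

    Assumption: co-occurrence counting is the tolerance relation; each term
    always belongs to its own class.
    """
    cooc = defaultdict(int)
    vocab = set()
    for doc in docs:
        terms = set(doc)
        vocab |= terms
        for a, b in combinations(sorted(terms), 2):
            cooc[(a, b)] += 1
    classes = {t: {t} for t in vocab}
    for (a, b), count in cooc.items():
        if count >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approximation(doc, classes):
    """Enriched representation: every term whose tolerance class overlaps the document."""
    terms = set(doc)
    return {t for t, cls in classes.items() if cls & terms}


def initial_negatives(positives, unlabeled, classes):
    """Hypothetical rule: an unlabeled document is an initial negative when its
    enriched term set shares no term with the enriched positive documents."""
    pos_terms = set()
    for doc in positives:
        pos_terms |= upper_approximation(doc, classes)
    return [d for d in unlabeled if not (upper_approximation(d, classes) & pos_terms)]
```

In a full pipeline, the documents returned by `initial_negatives` would seed the negative class for the first SVM run, and each later run would reclassify the remaining unlabeled documents.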
Similar resources
Approach for Dimensionality Reduction in Web Page Classification
Dimensionality refers to the number of terms in a web page. High dimensionality causes problems when classifying web pages. The main objective of reducing the dimensionality of web pages is to improve the performance of the classifier. Processing time and accuracy are the two parameters that influence the performance of a classifier. To reduce the processing time, less informative and redundant te...
Heterogeneous Learner for Web Page Classification
Classification of an interesting class of Web pages (e.g., personal homepages, resume pages) has been an interesting problem. Typical machine learning algorithms for this problem require two classes of data for training: positive and negative training examples. However, in application to Web page classification, gathering an unbiased sample of negative examples appears to be difficult. We propo...
A Rough Set-Aided System for Sorting WWW Bookmarks
Most people store 'bookmarks' to web pages. These allow the user to return to a web page later on, without having to remember the exact URI/URL address. People attempt to organise their bookmark databases by filing bookmarks under categories, themselves arranged in a hierarchical fashion. As the maintenance of such large repositories is difficult and time-consuming, a tool that automatically categ...
Feature Selection with Rough Sets for Web Page Classification
Web page classification is the problem of assigning predefined categories to web pages. A challenge in web page classification is how to deal with the high dimensionality of the feature space. We present a feature reduction method based on the rough set theory and investigate the effectiveness of the rough set feature selection method on web page classification. Our experiments indicate that ro...
The naive Bayes text classification algorithm based on rough set in the cloud platform
This paper improves the naïve Bayesian classification algorithm by combining it with rough set theory, yielding a naïve Bayesian classifier based on rough sets. We implement this algorithm on a cloud platform using the MapReduce programming model and obtain excellent results. A recall rate of 76.4 was achieved when classifying Tibetan Web pages.
Journal title:
Volume, issue:
Pages: -
Publication date: 2007